ABSTRACT



This issue of NISE Brief discusses the weaknesses of the most
commonly used educational outcome indicators (average and median test scores
and proficiency-level indicators) and the advantages of value-added
indicators. It critiques the average test score as a measure of
school and program performance, using an example based on national data.
The data requirements of value-added indicators are also discussed. (Contains 13
references.) (ASK)
States and districts are increasingly turning to school accountability as an instrument of reform.
Educational outcome indicators frequently
are used to measure the performance of
schools, programs, and policies. Reliance
on such indicators is largely the result of a growing
demand to hold these entities accountable for
their performance, defined in terms of outcomes,
such as standardized test scores in mathematics,
science, and reading, rather than inputs, such as
teacher qualifications, class size, or the quality of
lab facilities. This Brief discusses the weaknesses
of the most commonly used educational outcome
indicators (average and median test scores and
proficiency-level indicators) and the advantages
of value-added indicators. 1 Several major conclusions
emerge from the analysis.

First, the most common educational indi- 
cators are highly flawed as measures of school 
and program performance, even if they are 
derived from highly valid assessments. As a 
result, they are of limited value, if not useless, 
1973 to 1982 and then partial recovery
between 1982 and 1986. The eleventh-grade
data, by themselves, are fully consistent
with the premise that academic
reforms in the early and mid-1980s generated
substantial gains in academic
achievement. In fact, an analysis of the
data based on a gain indicator (a value-added
type indicator) rather than an
average test score suggests the opposite
conclusion (see Panel B of Table 1).

The gain indicator is similar to a true
value-added indicator in that it controls
for differences among students in prior
achievement. It does so in a very simple
and intuitive way: gain is the change in
average test scores over time (and across
grades) for the same cohort of students.
For example, the gain in test scores for
students who were in eleventh grade
in 1986 is given by the average test
score of eleventh-grade students in 1986
minus the average test score of seventh-grade
students in 1982, four grades and
four years earlier (that is, 302.0 - 268.6
= 33.4). Unfortunately, the gain indicator,
unlike the value-added indicator,
does not control for differences in
student, family, and neighborhood characteristics
that contribute to growth in
student achievement. As a result, the gain
indicator reflects possible changes over
time in the composition of the population
as well as changes in school productivity. 4
Nonetheless, it is instructive to
compare the gains in achievement experienced
by different cohorts. 5
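
The cohort gains can be reproduced directly from the average scores in
Panel A of Table 1. The short Python sketch below is only an illustration
of the arithmetic; the dictionary layout and the function name cohort_gain
are assumptions made for this example, and the first survey year is taken
to be 1973, consistent with the "1973 to 1978" column of Panel B.

```python
# Average NAEP mathematics scores from Panel A of Table 1,
# keyed by (grade, survey year). The first survey year is assumed
# to be 1973, matching the "1973 to 1978" column of Panel B.
avg_score = {
    (3, 1973): 219.1, (3, 1978): 218.6, (3, 1982): 219.0, (3, 1986): 221.7,
    (7, 1973): 266.0, (7, 1978): 264.1, (7, 1982): 268.6, (7, 1986): 269.0,
    (11, 1973): 304.4, (11, 1978): 300.4, (11, 1982): 298.5, (11, 1986): 302.0,
}


def cohort_gain(start_grade, start_year, end_grade, end_year):
    """Gain indicator: later average minus earlier average for one cohort."""
    later = avg_score[(end_grade, end_year)]
    earlier = avg_score[(start_grade, start_year)]
    return round(later - earlier, 1)


# Each pair is (year of the earlier survey, year of the later survey).
periods = [(1973, 1978), (1978, 1982), (1982, 1986)]

gains_3_to_7 = [cohort_gain(3, y0, 7, y1) for y0, y1 in periods]
gains_7_to_11 = [cohort_gain(7, y0, 11, y1) for y0, y1 in periods]

print(gains_3_to_7)   # [45.0, 50.0, 50.0]
print(gains_7_to_11)  # [34.4, 34.4, 33.4]
```

The last entry of gains_7_to_11 is the 33.4-point gain worked out in the
text for students who were in eleventh grade in 1986.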

As indicated in Panel B, the achieve- 
ment growth of high school students 
(from seventh to eleventh grade) during 
the 1982 to 1986 period was actually
no better than achievement growth 
during previous periods. In fact, the gain 
from seventh to eleventh grade was actu- 
ally slightly lower during the 1982 to 
1986 period than in previous periods! 
The rise in eleventh-grade math scores 
from 1982 to 1986 stems from an earlier 
increase in achievement growth for that 
cohort rather than from an increase in 
achievement growth over grades seven to 
eleven. In short, these data provide no 
support for the notion that high school 
academic reforms generated significant 
increases in test scores during the mid- 
1980s. These data also vividly confirm 
the general superiority of the gain indica- 
tor, relative to level indicators such as the 
average test score, as a measure of educa- 
tional productivity. 
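
The reasoning in this paragraph rests on a simple identity: the change in
the eleventh-grade average between two surveys equals the change in the
cohort's seventh-grade starting point plus the change in the gain from
seventh to eleventh grade. The sketch below checks that identity with the
Table 1 figures; the variable names are chosen for this illustration only.

```python
# Average scores taken from Panel A of Table 1.
s7_1978, s7_1982 = 264.1, 268.6      # seventh grade
s11_1982, s11_1986 = 298.5, 302.0    # eleventh grade

# Change in the level indicator for eleventh grade, 1982 to 1986.
rise_in_11th = s11_1986 - s11_1982

# Change in where the two cohorts started (seventh grade, four years earlier).
change_in_start = s7_1982 - s7_1978

# Change in the gain indicator between the two cohorts.
gain_78_82 = s11_1982 - s7_1978
gain_82_86 = s11_1986 - s7_1982
change_in_gain = gain_82_86 - gain_78_82

print(f"{rise_in_11th:.1f} = {change_in_start:.1f} + ({change_in_gain:.1f})")
# prints: 3.5 = 4.5 + (-1.0)
```

The 3.5-point rise in the eleventh-grade average is more than accounted
for by the 4.5-point higher seventh-grade starting point of that cohort;
the gain over grades seven to eleven actually fell by 1.0 point.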

It would be interesting to report the 
above analysis using true value-added as 
opposed to gain indicators. Unfortu- 
nately, the NAEP data do not permit 
such an analysis to be conducted, since 
the same students are not sampled for 
two consecutive NAEP surveys. This 
weakness in NAEP data could be reme- 
died by switching to a survey design that 
was at least partially longitudinal. 



Value-Added Indicators:
Data Requirements

Given the problems that exist with the 
average test score and other level indica- 
tors and, to a lesser degree, the gain indi- 
cator, it is important to consider whether 
value-added indicators could potentially 
be used as the primary tool for evaluating 
the performance of schools and pro- 
grams. There are at least two reasons to 
be optimistic in this regard. First, value- 
added models have been used extensively 
over the last three decades by evaluators 
and other researchers interested in educa- 
tion and training programs. Second, a 
number of districts and states, including 
Dallas, Minneapolis, South Carolina, 
and Tennessee, have successfully imple- 
mented value-added indicator systems. 6 

Nonetheless, despite the promise of 
value-added indicator systems, it is clear 
that they require a major commitment. 
In particular, districts and states must be 
prepared to (1) assess students frequently
and (2) develop comprehensive district 
or state data systems that contain infor- 
mation on student test scores and 
student, family, and community charac- 
teristics. The need for frequent testing 
stems from the fact that value-added 
indicators are designed to measure the 
contribution of schools to growth in 
student achievement over a given time 
period. In order to be able to construct
value-added (or gain) indicators,
it is therefore necessary to have
achievement data for the same
individuals at two points in time.
Students who are missing either
pre- or posttest data must be
excluded from the analysis and
thus from a district's accountability
and/or evaluation system.

Table 1. NAEP Mathematics Examination Data

(A) Average Test Scores by Year

GRADE    1973     1978     1982     1986
3rd      219.1    218.6    219.0    221.7
7th      266.0    264.1    268.6    269.0
11th     304.4    300.4    298.5    302.0

(B) Average Test Score Gain From Year to Year for Each Cohort

GRADE          1973 to 1978    1978 to 1982    1982 to 1986
3rd to 7th     45.0            50.0            50.0
7th to 11th    34.4            34.4            33.4

Source: Dossey et al. (1988).

[Caption: Student and family characteristics also contribute to student achievement.]
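
The requirement that each student have both a pretest and a posttest
score can be illustrated with a small matching step. The sketch below
uses made-up student IDs and scores (all hypothetical) to show which
students can contribute to a gain indicator and which are dropped.

```python
# Hypothetical records: student ID -> scale score on each test occasion.
pretest = {"s01": 250.0, "s02": 262.0, "s03": 241.0, "s04": 270.0}
posttest = {"s01": 261.0, "s02": 270.0, "s04": 284.0, "s05": 255.0}

# Only students with both scores can contribute to a gain
# (or value-added) indicator.
matched = sorted(pretest.keys() & posttest.keys())

# Students missing either score are excluded from the analysis.
excluded = sorted(pretest.keys() ^ posttest.keys())

gains = {sid: posttest[sid] - pretest[sid] for sid in matched}
mean_gain = sum(gains.values()) / len(gains)

# Share of all students seen on either occasion who are actually covered.
coverage = len(matched) / len(pretest.keys() | posttest.keys())

print(matched)    # ['s01', 's02', 's04']
print(excluded)   # ['s03', 's05']
print(mean_gain)  # 11.0
print(coverage)   # 0.6
```

In this made-up case two of five students fall outside the indicator: one
who left before the posttest and one who arrived after the pretest.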

From the perspective of mea- 
suring school and program 
performance, an ideal testing 
program would do the following: 

• Test all students annually 
during the late spring. Many 
districts currently follow this
practice. 

• Test all students who attend 
summer school at the end of 
the summer (or in the fall at 
the beginning of the subse- 
quent school year). Follow- 
ing the recent boom in 
summer school enrollments, 
many districts have begun 
testing students at the end of
summer school. 

• Test mobile students at the 
point of entry into the dis- 
trict (or into a new school in the 
district). 7 Minneapolis is one of the 
districts that is pioneering the use of 
entry-point testing. As indicated 
below, this component is very 
important in a comprehensive 
assessment program. 

Annual testing has three major 
advantages. First, it maximizes account- 
ability by localizing school and program 
performance to the most natural unit of 
accountability: the grade level or class- 
room. Second, it yields up-to-date infor- 
mation on performance. Third, it 
severely limits the number of students 
who would be excluded due to student 
mobility and, as a result, yields a data set 
that is likely to be highly representative of 
the school population as a whole and
large enough to yield statistically reliable
school performance estimates. On the
other hand, less frequent testing, say
testing at grades kindergarten, 4, 8, and
12, might be acceptable for national purposes,
since student mobility is not really
an issue at the national level. For purposes
of evaluating local school and
program performance, however, the
problems created by student mobility
argue strongly for frequent testing.
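
A rough calculation shows why mobility matters more as the interval
between tests grows. The sketch below rests on a simplifying assumption
that is not taken from this Brief: every student has the same chance of
leaving the school each year, independently of earlier years. The 15
percent annual rate is purely hypothetical.

```python
def share_with_both_tests(annual_mobility_rate, years_between_tests):
    """Share of pretested students still enrolled at the posttest,
    assuming a constant, independent annual chance of leaving."""
    return (1.0 - annual_mobility_rate) ** years_between_tests


# Hypothetical annual mobility rate of 15 percent.
for years in (1, 4):
    share = share_with_both_tests(0.15, years)
    print(f"{years} year(s) between tests: {share:.0%} of students retained")
```

Under these assumptions, annual testing retains 85 percent of students
for the gain calculation, while a four-year gap retains only about half.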

Adding a post-summer-school test 
yields one additional advantage; namely, 
it allows districts to separately evaluate 
the productivity of programs during the 
regular school year and those during the 
summer. 8 Adding a point-of-entry test 
for in-migrant students enables districts 
to evaluate the degree to which
mobile students experience 
growth in achievement that is 
comparable to that of nonmobile 
students. Furthermore, it allows 
these students to be included in 
state and district performance 
indicators. 9 When schools are 
increasingly under pressure 
to achieve high (measured) 
performance, adopting an indi- 
cator/evaluation system that 
systematically excludes any group 
in the population seems particu- 
larly unwise. 
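
The value of the post-summer-school test can be seen in a small worked
example. With three test occasions, the twelve-month gain splits into a
summer component and a regular-school-year component. The class name,
field names, and scores below are hypothetical and serve only to
illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    spring: float        # late-spring test, year t
    post_summer: float   # test at the end of summer school
    next_spring: float   # late-spring test, year t + 1


def split_growth(s):
    """Return (summer gain, regular-school-year gain) for one student."""
    summer = s.post_summer - s.spring
    school_year = s.next_spring - s.post_summer
    return summer, school_year


student = Scores(spring=250.0, post_summer=254.0, next_spring=266.0)
summer_gain, school_year_gain = split_growth(student)

print(summer_gain)       # 4.0
print(school_year_gain)  # 12.0
```

Without the post-summer score, only the combined 16-point gain from
spring to spring would be observed, and it could not be divided between
the summer program and the regular school year.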

One potential obstacle to pro- 
ducing high-quality value-added 
indicators is the difficulty of 
collecting extensive information 
on student and family characteris- 
tics. These data are required as 
“control variables” in value-added 
models. In most schools the fol- 
lowing data are typically available 
from administrative records: race 
and ethnicity, gender, special edu- 
cation status, limited English pro- 
ficiency (LEP) status, eligibility 
for free or reduced-price lunch, 
and whether a family receives welfare 
benefits. Supplemental surveys of stu- 
dents and parents may be used to collect 
other information, such as parental 
education and income and family atti- 
tudes toward education (variables known 
to be powerful determinants of student 
achievement growth). 

The consequence of failing to control
adequately for student, family, and community
characteristics is that value-added
indicators may be contaminated if there
are major differences across schools and
programs in unmeasured (uncontrolled)
student, family, and community characteristics.
Thus, value-added indicators
derived from models with “weak” predictors
of student achievement growth
might be only slightly better than gain
indicators (better in the sense of being
more highly correlated with a theoretically
perfect value-added indicator). Even
so, they are likely to be much better indicators
than average test scores. The key
issue, of course, is not whether a particular
value-added indicator is perfect.
Rather, the issue is whether the indicator
provides a substantially better measure of
school and program performance than
other affordable indicators.

[Caption: A value-added approach to school accountability is useful and possible.]

The cost of implementing an assessment
system that is sufficient to support
value-added (or gain) indicators is obviously
higher than that of an assessment system
that tests students only in selected grades
(say, 4, 8, and 12). The thrust of this
Brief is that an assessment system with 
infrequent testing is unlikely to produce 
outcome indicators that are valid for the 
purpose of measuring school perfor- 
mance. Thus, a district that is unwilling 
or unable to support the expense of 
frequent assessment should be very wary 
of using the achievement data that it does 
collect to evaluate the performance of 
schools and programs. 

Conclusions and 
Recommendations 

Average and median test scores and profi- 
ciency-level indicators, the most com- 
monly used indicators in American 
education, are highly suspect as indicators 
of school and program performance. These 
indicators suffer from four major deficien- 
cies: they fail to localize performance to the 
classroom or grade level; they aggregate 
information on performance that tends to 
be grossly out of date; they are contami- 
nated by student mobility; and they fail to 
measure the distinct contribution of 
schools and programs to growth in student 
achievement as separate from the contribu- 
tion due to student, family, and commu- 
nity factors. As a result, they are flawed 
measures for evaluation purposes and are
weak, if not counterproductive, instru- 
ments of public accountability. 

The gain indicator (the change in 
average test scores from grade to grade for 
the same cohort of students) and the 
value-added indicator (the gain indicator 
statistically adjusted for differences across 
schools and programs in the type of stu- 
dents served) avoid the first of these four 
problems. In addition, the value-added 
indicator potentially eliminates the bias 
that exists in the gain indicator due to 
differences across schools in student, 
family, and community characteristics, 
particularly if it is based on a model that 
includes an extensive set of control vari- 
ables. In this case, it fully eliminates the 
incentive for schools to cream. 
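
As a minimal sketch of what a statistical adjustment of the gain
indicator can look like, the example below uses made-up records for two
hypothetical schools and a single control variable, eligibility for free
or reduced-price lunch. The adjustment shown (a within-school slope
applied to each school's student mix) is an illustrative simplification
chosen for this example; it is not the statistical model used in this
Brief, and a real system would use many more students and controls.

```python
from collections import defaultdict

# Hypothetical student records: (school, lunch_eligible, test score gain).
# Both schools are built to be equally productive; they differ only in
# the mix of students they serve.
records = [
    ("A", 0, 10.0), ("A", 0, 10.0), ("A", 0, 10.0), ("A", 1, 6.0),
    ("B", 0, 10.0), ("B", 1, 6.0), ("B", 1, 6.0), ("B", 1, 6.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


by_school = defaultdict(list)
for school, eligible, gain in records:
    by_school[school].append((eligible, gain))

# Unadjusted gain indicator: mean gain per school.
raw_gain = {s: mean(g for _, g in rows) for s, rows in by_school.items()}
# Share of lunch-eligible students per school.
share = {s: mean(x for x, _ in rows) for s, rows in by_school.items()}

# Slope of gain on the control variable, estimated from differences
# among students within the same school.
num = den = 0.0
for s, rows in by_school.items():
    for x, g in rows:
        num += (x - share[s]) * (g - raw_gain[s])
        den += (x - share[s]) ** 2
slope = num / den

# Adjust each school's mean gain to the overall student mix.
overall_share = mean(x for _, x, _ in records)
adjusted = {
    s: raw_gain[s] - slope * (share[s] - overall_share) for s in by_school
}

print(raw_gain)  # {'A': 9.0, 'B': 7.0}
print(slope)     # -4.0
print(adjusted)  # {'A': 8.0, 'B': 8.0}
```

The unadjusted gains make school A look better than school B even though
the two were constructed to be equally productive. Once the difference in
student mix is taken into account, the two schools receive the same
indicator value, so neither is rewarded for the students it enrolls.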

The value-added approach to mea- 
suring school and program performance 
relies on a statistical model to identify the 
distinct contributions made by schools 
and programs to growth in student 
achievement. The quality of a value-added
indicator is determined by four
factors: the frequency with which students
are tested, the quality and appropriateness
of the tests that underlie the
indicators, the adequacy of the control
variables included in the value-added
models, and the appropriateness (validity)
of the statistical model used
to define the indicator. In terms of the
first factor, states and districts need to 
seriously consider testing students at 
every grade level, beginning with kinder- 
garten; to further improve their indicator 
systems, states and districts need to think 
about testing summer school students 
and in-migrant students at the point of 
entry into the school or district. With 
respect to the second and third issues, it 
is important that states and districts 
make it a major priority to collect exten- 
sive and reliable information on student 
and family characteristics and to develop 
state tests that are technically sound and 
fully attuned to their educational goals. 
Finally, ongoing research is needed to 
assess the sensitivity of estimates of school 
and program performance to alternative 
statistical models and alternative sets of 
control variables. 